Skip to content

YohannaWANG/CauchyEst

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

   

Documentation Status Maintenance License

Learning Sparse Fixed-Structure Gaussian Bayesian Networks

This is the implementation of the following paper:

Arnab Bhattacharyya, Davin Choo, Rishikesh Gajjala, Sutanu Gayen, and Wang Yuhao "Learning Sparse Fixed-Structure Gaussian Bayesian Networks." arXiv preprint arXiv:2107.10450 (2021).

Introduction

Gaussian Bayesian networks (a.k.a. linear Gaussian structural equation models) are widely used to model causal interactions among continuous variables. In this work, we study the problem of learning a fixed-structure Gaussian Bayesian network up to a bounded error in total variation distance. We analyze the commonly used node-wise least squares regression LeastSquares and prove that it has the near-optimal sample complexity. We also study a couple of new algorithms for the problem:

  • BatchAvgLeastSquares takes the average of several batches of least squares solutions at each node, so that one can interpolate between the batch size and the number of batches. We show that BatchAvgLeastSquares also has near-optimal sample complexity.
  • CauchyEst takes the median of solutions to several batches of linear systems at each node. We show that the algorithm specialized to polytrees, CauchyEstTree, has near-optimal sample complexity.

Experimentally, we show that for uncontaminated, realizable data, the LeastSquares algorithm performs best, but in the presence of contamination or DAG misspecification, CauchyEst, CauchyEstTree and BatchAvgLeastSquares respectively perform better.

Example

Prerequisites

Contents

  • Data - Real Bayesian network data from bnlearn
  • data.py - synthetic DAG data, including graph simulation and data simulation. Load real Bnlearn data
  • evl.py - algorithm accuracy evaluation based on KL-divergence
  • config.py - Set parameters for Bayesian network (eg. node number, graph degree, sample size)
  • methods.py - the implementation of all algorithms
  • utils.R - load bnlearn graph; Run CLIME algorithm
  • main.py - Run some examples of our algorithm

Parameters

Parameter Type Description Options
n int number of nodes -
s int number of samples -
d int average degree of node -
ill int number of ill conditioned nodes -
batch int number of batch size -
load str load synthetic or real data syn, real
choice str choice which real data ecoli70, magic-niab, magic-irri, arth150
tg str type of a graph chain, er, sf, rt
tn str type of noise ev, uv, ca, ill, exp, gum

Running a simple demo

The simplest way to try out CauchyEst is to run a simple example:

$ git clone https://github.com/YohannaWANG/CauchyEst.git
$ cd CauchyEst/
$ python CauchyEst/main.py

Runing as a command

Alternatively, if you have a CSV data file X.csv, you can install the package and run the algorithm as a command:

$ pip install git+git://github.com/YohannaWANG/CauchyEst
$ cd CauchyEst
$ python main.py --n 100 --d 5 --tg 'er' --tn 'uv' 

Algorithms

  • #f03c15 Algorithm 1 states our two-phased recovery approach. We estimate the coefficients of the Bayesian network in the first phase and use them to recover the variances in the second phase. characterization
  • #c5f015 Algorithm 2 is the variance recovery algorithm given coefficient estimates. characterization
  • #1589F0 Algorithm 3 is recovering the coefficients in a Bayesian network using a linear least squares estimator. characterization
  • #c5f015 Algorithm 4 is a generalization of the least square algorithm. characterization
  • #d03c15 Algorithm 5 is a coefficient recovery algorithm based on Cauchy random variables (for variable with p parents). characterization
  • #1589F0 Algorithm 6 is a coefficient recovery algorithm based on Cauchy random variables (for general Bayesian networks). characterization
  • #f03c15 Algorithm 7 works for the special case of polytree Bayesian networks. characterization

Performance

100 nodes, degree 5, ER graph Noisy data(5% noisy sample, 5/100 noisy nodes), d=5, ER graph
characterization characterization

Citation

If you use any part of this code in your research or any engineering project, please cite our paper:

@article{bhattacharyya2021learning,
  title={Learning Sparse Fixed-Structure Gaussian Bayesian Networks},
  author={Bhattacharyya, Arnab and Choo, Davin and Gajjala, Rishikesh and Gayen, Sutanu and Wang, Yuhao},
  journal={arXiv preprint arXiv:2107.10450},
  year={2021}
}

Contacts

Ask Me Anything ! Please feel free to contact us if you meet any problem when using this code. We are glad to hear other advise and update our work. We are also open to collaboration if you think that you are working on a problem that we might be interested in it. Please do not hestitate to contact us!

About

We analyze algorithms to learn Gaussian Bayesian networks with known structure up to a bounded error in total variation distance.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published